3,072 research outputs found

    An asymptotic theory for model selection inference in general semiparametric problems.

    Recently, Hjort and Claeskens (2003) developed an asymptotic theory for model selection, model averaging and post-model selection/averaging inference using likelihood methods in parametric models, along with associated confidence statements. In this paper, we consider a semiparametric version of this problem, wherein the likelihood depends on parameters and an unknown function, and model selection/averaging is to be applied to the parametric parts of the model. We show that all the results of Hjort and Claeskens hold in the semiparametric context, if the Fisher information matrix for parametric models is replaced by the semiparametric information bound for semiparametric models, and if maximum likelihood estimators for parametric models are replaced by semiparametric efficient profile estimators. The results also describe the behavior of semiparametric model estimates when the parametric component is misspecified, and have implications as well for pointwise consistent model selectors.
    Keywords: Akaike information criterion; Bayes information criterion; Behavior; Efficient semi-parametric estimation; Estimator; Frequentist model averaging; Implications; Information; Matrix; Maximum likelihood; Methods; Model; Model averaging; Model selection; Models; Problems; Profile likelihood; Research; Selection; Semiparametric model; Theory

    Testing Hardy-Weinberg equilibrium with a simple root-mean-square statistic

    We provide evidence that, in certain circumstances, a root-mean-square test of goodness of fit can be significantly more powerful than state-of-the-art tests in detecting deviations from Hardy-Weinberg equilibrium. Unlike Pearson's χ² test, the log-likelihood-ratio test, and Fisher's exact test, which are sensitive to relative discrepancies between genotypic frequencies, the root-mean-square test is sensitive to absolute discrepancies. This can increase statistical power, as we demonstrate using benchmark data sets and simulations, and through asymptotic analysis. © The Author 2013. Published by Oxford University Press.
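The contrast between absolute and relative discrepancies can be made concrete with a small sketch. It assumes a single biallelic locus and uses one common definition of the root-mean-square statistic (the square root of the mean squared difference between observed and expected genotype proportions); the abstract does not give the paper's exact normalization or how its p-values are computed, so the function name `hwe_statistics` and these choices are illustrative assumptions only.

```python
import math

def hwe_statistics(n_hom_a, n_het, n_hom_b):
    """Return (rms, chi2) for genotype counts at one biallelic locus.

    Assumption: expected proportions come from the sample allele
    frequency p under Hardy-Weinberg equilibrium (p^2, 2pq, q^2).
    """
    n = n_hom_a + n_het + n_hom_b
    p = (2 * n_hom_a + n_het) / (2 * n)  # sample frequency of allele A
    q = 1.0 - p
    if not 0.0 < p < 1.0:
        raise ValueError("both alleles must be observed")
    expected = [p * p, 2 * p * q, q * q]
    observed = [n_hom_a / n, n_het / n, n_hom_b / n]
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    # RMS weights every genotype equally: absolute discrepancies.
    rms = math.sqrt(sum(squared) / len(squared))
    # Pearson's chi-square divides by the expected proportion:
    # relative discrepancies, so rare genotypes weigh more.
    chi2 = n * sum(s / e for s, e in zip(squared, expected))
    return rms, chi2

rms, chi2 = hwe_statistics(30, 40, 30)
print(rms, chi2)
```

Turning either statistic into a test would still require a reference distribution (for example by simulation under the null), which this sketch does not attempt.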

    Bounded Influence Regression in the Presence of Heteroskedasticity of Unknown Form

    In a regression model with conditional heteroskedasticity of unknown form, we propose a general class of M-estimators scaled by nonparametric estimates of the conditional standard deviations of the dependent variable. We give regularity conditions under which these estimators are asymptotically equivalent to M-estimators scaled by the true conditional standard deviations. The practical performance of these estimators is investigated through a Monte Carlo experiment.

    A simultaneous confidence band for sparse longitudinal regression

    Functional data analysis has received considerable recent attention and a number of successful applications have been reported. In this paper, asymptotically simultaneous confidence bands are obtained for the mean function of the functional regression model, using piecewise constant spline estimation. Simulation experiments corroborate the asymptotic theory. The confidence band procedure is illustrated by analyzing CD4 cell counts of HIV-infected patients.

    Model averaging based on Kullback-Leibler distance

    © 2015, Institute of Statistical Science. All rights reserved. This paper proposes a model averaging method based on Kullback-Leibler distance under a homoscedastic normal error term. The resulting model average estimator is proved to be asymptotically optimal. When combining least squares estimators, the model average estimator is shown to have the same large sample properties as the Mallows model average (MMA) estimator developed by Hansen (2007). We show via simulations that, in terms of mean squared prediction error and mean squared parameter estimation error, the proposed model average estimator is more efficient than the MMA estimator and the estimator based on model selection using the corrected Akaike information criterion in small sample situations. A modified version of the new model average estimator is further suggested for the case of heteroscedastic random errors. The method is applied to a data set from the Hong Kong real estate market

    Semiparametric Bayesian analysis of gene-environment interactions with error in measurement of environmental covariates and missing genetic data

    Case-control studies are widely used to detect gene-environment interactions in the etiology of complex diseases. Many variables that are of interest to biomedical researchers are difficult to measure on an individual level, e.g. nutrient intake, cigarette smoking exposure, long-term toxic exposure. Measurement error causes bias in parameter estimates, thus masking key features of data and leading to loss of power and spurious/masked associations. We develop a Bayesian methodology for analysis of case-control studies for the case when measurement error is present in an environmental covariate and the genetic variable has missing data. This approach offers several advantages. It allows prior information to enter the model to make estimation and inference more precise. The environmental covariates measured exactly are modeled completely nonparametrically. Further, information about the probability of disease can be incorporated in the estimation procedure to improve the quality of parameter estimates, which cannot be done in conventional case-control studies. A unique feature of the procedure under investigation is that the analysis is based on a pseudo-likelihood function; therefore, conventional Bayesian techniques may not be technically correct. We propose an approach using Markov Chain Monte Carlo sampling as well as a computationally simple method based on an asymptotic posterior distribution. Simulation experiments demonstrated that our method produced parameter estimates that are nearly unbiased even for small sample sizes. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma development.

    Spatial regression with covariate measurement error: A semiparametric approach

    © 2016, The International Biometric Society. Spatial data have become increasingly common in epidemiology and public health research thanks to advances in GIS (Geographic Information Systems) technology. In health research, for example, it is common for epidemiologists to incorporate geographically indexed data into their studies. In practice, however, the spatially defined covariates are often measured with error. Naive estimators of regression coefficients are attenuated if measurement error is ignored. Moreover, the classical measurement error theory is inapplicable in the context of spatial modeling because of the presence of spatial correlation among the observations. We propose a semiparametric regression approach to obtain bias-corrected estimates of regression parameters and derive their large sample properties. We evaluate the performance of the proposed method through simulation studies and illustrate using data on Ischemic Heart Disease (IHD). Both simulation and practical application demonstrate that the proposed method can be effective in practice
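The attenuation mentioned above is easiest to see in the classical, non-spatial setting, which is precisely the setting the abstract says does not carry over to spatially correlated data. The simulation below is therefore only a baseline illustration under assumed independent Gaussian errors, not the proposed semiparametric method; the function names and parameter values are hypothetical.

```python
import random

def attenuation_factor(sd_x, sd_u):
    """Classical reliability ratio: var(X) / (var(X) + var(U))."""
    return sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)

def naive_slope(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, sd_eps=0.5, seed=0):
    """Least-squares slope of Y on the error-prone covariate W = X + U.

    Assumption: X, U and the regression error are independent
    zero-mean Gaussians (no spatial correlation).
    """
    rng = random.Random(seed)
    w_vals, y_vals = [], []
    for _ in range(n):
        x = rng.gauss(0.0, sd_x)                    # true covariate
        w_vals.append(x + rng.gauss(0.0, sd_u))     # observed with error
        y_vals.append(beta * x + rng.gauss(0.0, sd_eps))
    w_bar = sum(w_vals) / n
    y_bar = sum(y_vals) / n
    s_wy = sum((w - w_bar) * (y - y_bar) for w, y in zip(w_vals, y_vals))
    s_ww = sum((w - w_bar) ** 2 for w in w_vals)
    return s_wy / s_ww

# The naive slope should sit near beta * attenuation_factor, not near beta.
print(naive_slope(), 2.0 * attenuation_factor(1.0, 1.0))
```

With equal covariate and error variances the reliability ratio is 0.5, so the naive slope is pulled roughly halfway toward zero.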

    Rapid publication-ready MS-Word tables for one-way ANOVA

    © 2014, Assaad et al.; licensee Springer. Background: Statistical tables are an important component of data analysis and reports in biological sciences. However, the traditional manual processes for computation and presentation of statistically significant results using a letter-based algorithm are tedious and prone to errors. Results: Based on the R language, we present two web-based software tools for individual and summary data, freely available online at http://shiny.stat.tamu.edu:3838/hassaad/Table_report1/ and http://shiny.stat.tamu.edu:3838/hassaad/SumAOV1/, respectively. The tools can rapidly generate publication-ready tables containing one-way analysis of variance (ANOVA) results. No download is required. Additionally, the software can perform multiple comparisons of means using the Duncan, Student-Newman-Keuls, Tukey-Kramer, and Fisher's least significant difference (LSD) tests. If the LSD test is selected, multiple methods (e.g., Bonferroni and Holm) are available for adjusting p-values. Using the software, the procedures of ANOVA can be completed within seconds using a web browser, preferably Mozilla Firefox or Google Chrome, and a few mouse clicks. Furthermore, the software can handle one-way ANOVA for summary data (i.e. sample size, mean, and SD or SEM per treatment group) with post-hoc multiple comparisons among treatment means. To our knowledge, none of the currently available commercial (e.g., SPSS and SAS) or open-source software (e.g., R and Python) can perform such a rapid task without advanced knowledge of the corresponding programming language. Conclusions: Our new and user-friendly software to perform statistical analysis and generate publication-ready MS-Word tables for one-way ANOVA is expected to facilitate research in agriculture, biomedicine, and other fields of life sciences.
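One-way ANOVA from summary data needs only the per-group sample sizes, means, and standard deviations. The following is a minimal sketch of that textbook computation, not the authors' R-based software; the function name `anova_from_summary` is hypothetical, and the p-value and post-hoc comparisons are left out because they require an F-distribution routine.

```python
def anova_from_summary(ns, means, sds):
    """Return (F, df_between, df_within) from per-group summaries.

    ns    -- sample size of each treatment group
    means -- sample mean of each group
    sds   -- sample standard deviation of each group (n - 1 divisor)
    """
    k = len(ns)
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-group sum of squares from the group means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares recovered from the group SDs.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between = k - 1
    df_within = total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Three groups of five observations each.
print(anova_from_summary([5, 5, 5], [10.0, 12.0, 14.0], [2.0, 2.0, 2.0]))
```

If a standard error of the mean is reported instead of an SD, it can be converted first with SD = SEM × √n.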

    Variogram estimation in the presence of trend

    Estimation of covariance function parameters of the error process in the presence of an unknown smooth trend is an important problem because solving it allows one to estimate the trend nonparametrically using a smoother corrected for dependence in the errors. Our work is motivated by spatial statistics but is applicable to other contexts where the dimension of the index set can exceed one. We obtain an estimator of the covariance function parameters by regressing squared differences of the response on their expectations, which equal the variogram plus an offset term induced by the trend. Existing estimators that ignore the trend produce biased estimates of the variogram parameters, a bias that our procedure corrects. Our estimator can be justified asymptotically under the increasing domain framework. Simulation studies suggest that our estimator compares favorably with those in the current literature while making less restrictive assumptions. We use our method to estimate the variogram parameters of the short-range spatial process in a U.S. precipitation data set.

    How to estimate the measurement error variance associated with ancestry proportion estimates

    We show how the variance of the measurement error (ME) associated with individual ancestry proportion estimates can be estimated, especially when the number of ancestral populations (k) is greater than 2. We extend existing internal consistency measures to estimate the ME variance, and we compare these estimates with the ME variance estimated by use of the repeated measurement (RM) approach. Both approaches work by dividing the genotyped markers into subsets. We examine the effect of the number of subsets and of the allocation of markers to each subset on the performance of each approach. We used simulated data for all comparisons. Independently of the value of k, the measures of internal reliability provided less biased and more precise estimates of the ME variance than did those obtained with the RM approach. Both methods tend to perform better when a large number of subsets of markers with similar sizes are considered. Our results will facilitate the use of ME correction methods to address the ME problem in individual ancestry proportion estimates. Our method will improve the ability to control for type I error inflation and loss of power in association tests and other genomic research involving ancestry estimates.